fix(router): use max_completion_tokens for OpenAI GPT-5+ validation#575
Open
cluster2600 wants to merge 1 commit into NVIDIA:main from
Conversation
All contributors have signed the DCO ✍️ ✅
Thank you for your interest in contributing to OpenShell, @cluster2600. This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer. To get vouched, see CONTRIBUTING.md for details.
…robe

OpenAI GPT-5 models reject the legacy `max_tokens` parameter and require `max_completion_tokens`. The inference validation probe now sends `max_completion_tokens` as the primary parameter, with an automatic fallback to `max_tokens` when the backend returns HTTP 400 (for legacy/self-hosted backends that only support the older parameter).

Closes NVIDIA#517

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
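For illustration, a minimal sketch of the two request bodies the probe sends, built as plain JSON strings. The parameter names and the 32-token cap come from the PR; the function names and the "ping" message are assumptions for this sketch, not the actual router code.

```rust
// Illustrative sketch only: builds the primary and fallback probe bodies as
// plain JSON strings. Helper names and the "ping" message are hypothetical;
// the two parameter names are the point of the change.

/// Primary probe body: GPT-5+ (and o1-era) models require max_completion_tokens.
fn primary_probe_body(model: &str) -> String {
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"ping"}}],"max_completion_tokens":32}}"#
    )
}

/// Fallback body, sent only after an HTTP 400 on the primary request: legacy
/// and some self-hosted backends understand only the older max_tokens name.
fn fallback_probe_body(model: &str) -> String {
    format!(
        r#"{{"model":"{model}","messages":[{{"role":"user","content":"ping"}}],"max_tokens":32}}"#
    )
}
```

Keeping the fallback as a second, fully formed body (rather than rewriting one field in place) matches the PR's approach of storing an optional `fallback_body` alongside the primary probe definition.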
3c89e9b to 44217f7
Summary
Resolves #517 — `openshell inference set` fails for OpenAI GPT-5 models because the validation probe sends the deprecated `max_tokens` parameter, which GPT-5+ rejects with HTTP 400.

- Send `max_completion_tokens` as the primary parameter in the OpenAI chat completions validation probe
- Fall back to `max_tokens` when the backend returns HTTP 400 (for legacy or self-hosted backends)
- Extract a `try_validation_request()` helper to avoid duplicating the request/response classification logic

Root Cause
OpenAI introduced `max_completion_tokens` as a replacement for `max_tokens` starting with the o1 series. GPT-5 and later models reject `max_tokens` entirely, returning HTTP 400. The validation probe only sent `max_tokens`, so inference setup would fail for any GPT-5+ model even though the endpoint was perfectly healthy.

```mermaid
graph TD
    subgraph "Before (broken)"
        A["validation_probe()"] -->|"max_tokens: 32"| B[OpenAI API]
        B -->|"HTTP 400: unsupported parameter"| C["ValidationFailure ❌"]
    end
    subgraph "After (fixed)"
        D["validation_probe()"] -->|"max_completion_tokens: 32"| E[OpenAI API]
        E -->|"HTTP 200"| F["ValidatedEndpoint ✅"]
        E -->|"HTTP 400"| G{fallback_body?}
        G -->|"yes"| H["retry with max_tokens: 32"]
        H -->|"HTTP 200"| I["ValidatedEndpoint ✅"]
        G -->|"no"| J["ValidationFailure ❌"]
    end
```

Changes
- `crates/openshell-router/src/backend.rs`: add `fallback_body` field to `ValidationProbe`; update the `openai_chat_completions` probe to use `max_completion_tokens` with a `max_tokens` fallback; extract the `try_validation_request()` helper; add 3 new tests
- `crates/openshell-server/src/inference.rs`: change `max_tokens` to `max_completion_tokens`

Test Plan
- `cargo test -p openshell-router`: 11 passed, 0 failed
- `verify_openai_chat_uses_max_completion_tokens`: primary probe succeeds with `max_completion_tokens`
- `verify_openai_chat_falls_back_to_max_tokens`: HTTP 400 on primary triggers retry with `max_tokens`
- `verify_non_chat_completions_no_fallback`: non-chat protocols (e.g. `anthropic_messages`) do not retry on 400

```mermaid
sequenceDiagram
    participant CLI as openshell inference set
    participant Router as Privacy Router
    participant Backend as OpenAI API
    CLI->>Router: verify_backend_endpoint()
    Router->>Backend: POST /v1/chat/completions<br/>{"max_completion_tokens": 32}
    alt GPT-5+ model
        Backend->>Router: HTTP 200
        Router->>CLI: ValidatedEndpoint ✅
    else Legacy backend
        Backend->>Router: HTTP 400 (unknown param)
        Router->>Backend: POST /v1/chat/completions<br/>{"max_tokens": 32}
        Backend->>Router: HTTP 200
        Router->>CLI: ValidatedEndpoint ✅
    end
```
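The retry flow these tests exercise can be sketched in a few lines. The names `ValidationProbe`, `fallback_body`, and `try_validation_request` come from the PR description, but the signatures, the `Outcome` type, and the string-based bodies here are assumptions made for a self-contained sketch, not the real backend.rs code.

```rust
// Hypothetical sketch of the probe's fallback decision. Only an HTTP 400 on
// the primary request triggers a retry, and only when a fallback body exists.

/// Minimal stand-in for the probe definition described in the PR.
struct ValidationProbe {
    /// Primary request body (uses "max_completion_tokens").
    body: &'static str,
    /// Optional retry body for HTTP 400 (uses "max_tokens"); None disables fallback.
    fallback_body: Option<&'static str>,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    ValidatedEndpoint,
    ValidationFailure(u16),
}

/// Classify one attempt: any 2xx validates the endpoint, anything else fails.
fn try_validation_request(status: u16) -> Result<(), u16> {
    if (200..300).contains(&status) { Ok(()) } else { Err(status) }
}

/// Run the probe; `send` maps a request body to the backend's HTTP status.
fn run_probe<F>(probe: &ValidationProbe, mut send: F) -> Outcome
where
    F: FnMut(&str) -> u16,
{
    match try_validation_request(send(probe.body)) {
        Ok(()) => Outcome::ValidatedEndpoint,
        // HTTP 400 may just mean "unknown parameter": retry once if allowed.
        Err(400) => match probe.fallback_body {
            Some(body) => match try_validation_request(send(body)) {
                Ok(()) => Outcome::ValidatedEndpoint,
                Err(s) => Outcome::ValidationFailure(s),
            },
            None => Outcome::ValidationFailure(400),
        },
        // Any other error status is terminal; no retry.
        Err(s) => Outcome::ValidationFailure(s),
    }
}
```

Gating the retry on `fallback_body` being `Some` is what lets non-chat protocols such as `anthropic_messages` keep their original fail-fast behavior on 400.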